 model building process


The rise of AutoCV: Why AutoCV will be a game changer?

#artificialintelligence

The variety of tasks it solves now, and will continue to solve over the next 5 or 10 years, will also increase exponentially. As such systems evolve, they naturally become more intuitive, familiar and easy for the public to use. Consider cars: once highly mechanical machines with open rooftops and bicycle wheels, resembling a horse cart more than a car in the modern sense, and affordable only to a small section of people because of both their cost and the limited availability of fuel, they have now become cheap, comfortable, fuel-efficient and safer. Electric cars are even replacing brilliantly engineered mechanical components with simpler ones that require minimal maintenance and are much safer for the environment. And let's not get started on autonomous cars.


Exciting new GitHub features powering machine learning

#artificialintelligence

Again, this is in a browser! For kicks and giggles, I wanted to see if I could run the full-blown model building process. For context, I believe notebooks are great for exploration but can become brittle when moving to repeatable processes. Eventually, MLOps requires moving the salient code into its own modules/scripts. If you sneak a peek above, you will see a notebooks folder and then a folder that contains the model training Python files.


5 machine learning skills you need in the cloud

#artificialintelligence

Machine learning and AI continue to reach further into IT services and complement applications developed by software engineers. IT teams need to sharpen their machine learning skills if they want to keep up. Cloud computing services support an array of functionality needed to build and deploy AI and machine learning applications. In many ways, AI systems are managed much like other software that IT pros are familiar with in the cloud. But just because someone can deploy an application, that does not necessarily mean they can successfully deploy a machine learning model.


The Four Maturity Levels Of ML Production Systems - AI Summary

#artificialintelligence

Like many ML practitioners, I started my ML journey with Kaggle competitions. But the comfortable setup of Kaggle, where you are handed largely clean data along with features and labels, could not be further from the reality of today's ML practitioner. In fact, the model building process itself is merely a small fraction of the work that needs to be done when developing an ML solution and deploying and maintaining it in production. It is useful to speak about ML production systems in terms of various degrees of maturity, where the least mature systems are one-off models, and the most mature systems run on autopilot, updating themselves with minimal human intervention. Here, I make a broad categorization of ML systems into four levels of increasing maturity, and discuss some of the challenges involved at each level. Disclaimer: given the choice of medium (a blog post, not a book chapter), this list will certainly be incomplete, and I didn't intend it to be exhaustive.


Fixing Bias in AI Systems by Building Better AI Models

#artificialintelligence

AI models are only as good as the algorithms and data they are trained on. When an AI system fails, it is usually due to one of three factors: 1) the algorithm has been incorrectly trained, 2) there is bias in the system's training data, or 3) there is developer bias in the model building process. The focus of this article is on the bias in training data and the bias that is coded directly into AI systems by model developers. "I think today, the AI community at large has a self-selecting bias simply because the people who are building such systems are still largely white, young and male. I think there is a recognition that we need to get beyond it, but the reality is that we haven't necessarily done so yet."


Build your first Machine Learning pipeline using scikit-learn!

#artificialintelligence

To build any machine learning model, it is important to have a sufficient amount of data to train it. The data is often collected from various sources and might be available in different formats. For this reason, data cleaning and preprocessing become a crucial step in a machine learning project. Whenever new data points are added to the existing data, we need to perform the same preprocessing steps again before we can use the machine learning model to make predictions. This becomes a tedious and time-consuming process!
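As a rough illustration of how a scikit-learn `Pipeline` avoids repeating preprocessing by hand, the sketch below chains imputation, scaling and a classifier so that new data passes through the same fitted steps automatically. The toy arrays, the step names and the choice of `LogisticRegression` are assumptions made for this example; they are not taken from the article.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: two numeric features, one missing value.
X_train = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 180.0],
    [8.0, 900.0],
    [9.0, 950.0],
    [10.0, 870.0],
])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Each step is fitted on the training data only; the fitted imputation
# and scaling parameters are then reused for every later prediction.
pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# New data points, including one with a missing value, go through the
# same preprocessing without any extra code.
X_new = np.array([
    [1.5, 190.0],
    [5.0, np.nan],
    [9.5, 910.0],
])
predictions = pipeline.predict(X_new)
print(predictions)
```

Keeping the preprocessing inside the pipeline also means the whole object can be saved and reloaded as one unit, rather than tracking imputation and scaling parameters separately.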


Garbage In, Garbage Out: Automated Machine Learning Begins with Quality Data

#artificialintelligence

It's no secret that machine learning methods are highly dependent on the quality of the data they receive as input. If you think of machine learning as a manufacturing process, the higher the quality of the input data, the more likely it is that the final product is of high quality as well. This relationship presents a big challenge to analytics teams when it comes to figuring out the right data for helping to solve business problems. Those teams need to prepare all datasets so that the machine learning process is free of errors. This involves setting up quality standards and fixing data issues like missing values or columns with low statistical variance, as well as selecting the right data types, removing duplicate data, and more.
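The data issues listed above can be sketched with pandas. This is a minimal example under stated assumptions: the column names and values are hypothetical, median imputation is only one of several reasonable choices, and "low variance" is simplified here to columns holding a single constant value.

```python
import pandas as pd

# Hypothetical raw dataset showing the issues named in the text.
raw = pd.DataFrame({
    "customer_id": ["1", "2", "2", "3", "4"],            # IDs stored as text
    "age": [34.0, 45.0, 45.0, None, 29.0],               # one missing value
    "country": ["DE", "DE", "DE", "DE", "DE"],           # constant column
    "spend": ["10.5", "20.0", "20.0", "7.25", "13.0"],   # numbers stored as text
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few basic quality fixes and return a new DataFrame."""
    out = df.drop_duplicates().copy()                    # remove duplicate rows
    out["customer_id"] = out["customer_id"].astype(int)  # select proper data types
    out["spend"] = pd.to_numeric(out["spend"])
    out["age"] = out["age"].fillna(out["age"].median())  # fill missing values
    # Drop columns that carry no information (a single distinct value).
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    return out.drop(columns=constant_cols).reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

In practice the threshold for "low variance" and the imputation strategy depend on the dataset, so these would normally be parameters rather than hard-coded choices.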


Exploratory Data Analysis (Non-Visual)

#artificialintelligence

I get asked many times, "How can I do a good Exploratory Data Analysis (EDA) so that I get the necessary information for feature engineering and building a machine learning model?" In this and the next post, I hope to answer that question. I will NOT claim my process is the best, but I hope that as more people come into the field, they can use my process as a basis for better EDA and build better models. There are two main benefits of doing EDA, and both pay off throughout the model building process. I will discuss EDA in two posts, non-visual (mainly through simple calculations) and visual.
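To give a sense of what non-visual EDA through simple calculations can look like, here is a small pandas sketch. It is not the author's process: the dataset, column names and the particular checks are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical dataset with a numeric target column, "price".
df = pd.DataFrame({
    "rooms": [2, 3, 3, 4, 5, None],
    "area_sqm": [50.0, 70.0, 65.0, 90.0, 120.0, 80.0],
    "city": ["A", "B", "A", "A", "B", "B"],
    "price": [150, 210, 200, 280, 390, 250],
})

# Summary statistics (count, mean, std, quartiles) for numeric columns.
summary = df.describe()

# Number of missing values per column.
missing = df.isna().sum()

# Frequency of each category in a categorical column.
city_counts = df["city"].value_counts()

# Linear correlation of each numeric feature with the target.
numeric = df.select_dtypes("number")
corr_with_target = numeric.corr()["price"].drop("price")

print(summary)
print(missing)
print(city_counts)
print(corr_with_target)
```

Checks like these can hint at which features need imputation, encoding or closer inspection before any plotting is done.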


Ask a Data Scientist: The Bias vs. Variance Tradeoff - insideBIGDATA

#artificialintelligence

Welcome back to our series of articles sponsored by Intel – "Ask a Data Scientist." Once a week you'll see reader-submitted questions of varying levels of technical detail answered by a practicing data scientist – sometimes by me and other times by an Intel data scientist. Think of this new insideBIGDATA feature as a valuable resource for you to get up to speed in this flourishing area of technology. If you have a big data question you'd like answered, please just enter a comment below, or send an e-mail to me at: daniel@insidehpc.com. This week's question is from a reader who wants an explanation of the "bias vs. variance tradeoff in statistical learning."